Creating Incentives for Honest Rating Submission by Rating the Raters

Author

  • Jörg H. Lepler

Abstract

Reputations of electronic parties are important for their economic viability in business settings. Potential clients of e-services cannot rely solely on their own experiences to gain knowledge about e-services’ reputations. Plainly aggregating rating comments from other clients can often be misleading. Therefore, we present a reputation aggregation model which (1) seeks the subgroup of raters that contains the largest degree of overall agreement and (2) derives the resulting reputation scores from their comments. We complement the reputation system with policies that promote both truth revelation in the rater commentary and competition over service quality between providers. Then we describe the recursive algorithm of the model, which judges a rater’s commenting quality based on the divergence between his and the other raters’ comments. This algorithm feeds the commenting quality back into the aggregation function as a weight on the rater’s comments. We evaluate our algorithm in simulations of three challenging threat scenarios. We show that our aggregation model can be used effectively to deselect raters with weak rating accuracy, and to filter out a malicious collective of raters which makes up almost half of the rating population. Finally, we show that the model is able to reveal provider behavior that is biased in favor of specific groups of raters.

1 Motivation for a Reputation Reporting Service

Clients choose a service provider on the basis of (1) price versus promised performance aspects such as, for example, a provider’s contractual terms in an SLA and (2) the client’s confidence in how well this provider will deliver on the negotiated performance. The aggregated confidence of many clients resembles the reputation of a provider. The importance of reputation in the clients’ decisions is often underestimated in the designs of electronic marketplaces.
This has been the case in, for example, Giovanetti and Ristuccia’s [6] analysis of the band-x backbone bandwidth market, where the researchers found that clients relied not so much on the reported performance numbers as on the reputations of large, well-known providers. Introducing widely accepted reputation systems for e-services requires addressing a collection of pitfalls that are inherent to them. Most of these pitfalls are not of a technical nature. For technical policies such as “we want to keep commentators anonymous”, we are able to devise technical solutions. The questions that are difficult to address concern the design philosophy. For one, one needs a definition of the scales by which to measure the reputation of a provider, a definition which includes an understanding of the pragmatic meaning of these scales. The reputation model we are presenting makes only one assumption about the choice of the reputation valuation function, namely that the reputation system operator has chosen an appropriate function. The question then arises: Can we trust the rating submissions from all raters? Should we give more weight to those raters who are seemingly more trustworthy than to those who might be less well-informed? To address these questions, our model can redistribute the influence it assigns to a rater on the aggregated scores, and do so in favor of certain better-informed raters. In addition, we are able to produce different views of the reputation results, depending on which rater, or set of raters, we trust more ex ante. A separate threat model category exists on the side of the providers: Do they treat all their clients equally? Can we recognize provider discrimination? Can we distinguish discriminatory behavior from biased client behavior? We address these different threat models by simulating them in challenging rating scenarios. Finally, do clients have some incentive to submit ratings at all?
Do they have a tangible incentive to state their experiences
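
As a rough illustration of the feedback loop described in the abstract (rater quality judged by divergence from the other raters, then fed back as a weight into the aggregation), the following Python sketch may help. The excerpt does not give the actual divergence measure or weighting function, so the mean absolute difference, the squared `1 - divergence` quality rule, the [0, 1] rating scale, the fixed iteration count, and every name below are assumptions made for illustration, not the author's algorithm. For simplicity the sketch also measures divergence against the aggregate of all raters, including the rater being judged.

```python
from typing import List, Tuple


def aggregate_reputation(
    ratings: List[List[float]], iterations: int = 20
) -> Tuple[List[float], List[float]]:
    """Iteratively reweight raters by their agreement with the aggregate.

    ratings[r][p] is rater r's score for provider p, assumed to lie in [0, 1].
    Returns (provider_scores, rater_weights); the weights sum to 1.
    """
    n_raters = len(ratings)
    n_providers = len(ratings[0])
    weights = [1.0 / n_raters] * n_raters  # start with equal influence
    scores = [0.0] * n_providers

    for _ in range(iterations):
        # Aggregation step: weighted mean of the ratings for each provider.
        scores = [
            sum(weights[r] * ratings[r][p] for r in range(n_raters))
            for p in range(n_providers)
        ]
        # Quality step: mean absolute divergence of each rater from the aggregate.
        divergence = [
            sum(abs(ratings[r][p] - scores[p]) for p in range(n_providers))
            / n_providers
            for r in range(n_raters)
        ]
        # Feedback step (assumed rule): low divergence yields high weight.
        quality = [(1.0 - d) ** 2 for d in divergence]
        total = sum(quality)
        if total == 0.0:
            break  # degenerate case: keep the previous weights
        weights = [q / total for q in quality]

    return scores, weights


# Hypothetical data: three consistent raters and two colluding raters
# who invert the scores of two providers.
example_ratings = [[0.9, 0.2]] * 3 + [[0.1, 0.9]] * 2
example_scores, example_weights = aggregate_reputation(example_ratings)
print(example_scores, example_weights)
```

In this toy setup the two colluding raters end up with less influence than the three consistent ones, so the aggregated scores move toward the majority view. A rule like this cannot tell an honest minority from a malicious one; it only rewards agreement.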


Similar resources

Non-native English Speaking Teachers’ Pragmatic Criteria in the Holistic and Analytic Rating of the Agreement Speech Act Productions of Iranian EFL Learners

Pragmatic rating is considered as one of the novel and crucial aspects of second language education which has not been maneuvered upon in the literature. To address this gap, the current study aimed to inspect the matches and mismatches, to explore rating variations, and to assess the rater consistency between the holistic and analytic rating methods of the speech act of agreement in L2 by non-...


Pragmatic Criteria in the Holistic and Analytic Rating of the Disagreement Speech Act of Iranian EFL Learners by Non-native English Speaking Teachers

Conveying a strong message within a language stems from not only a linguistically appropriate utterance but also a pragmatically appropriate discourse. Broadly considering various facets of pragmatics, pragmatic assessment has not been potentially brought into perspective. To address this discourse gap, this study, guided by the principles of mixed-method design, pursued three purposes: ...


Developing Rating Scale Descriptors for Assessing the Stages of Writing Process: The Constructs Underlying Students' Writing Performances

The purpose of the present study is to develop appropriate scoring scales for each of the defined stages of the writing process, and also to determine to what extent these scoring scales can reliably and validly assess the performances of EFL learners in an academic writing task. Two hundred and two students’ writing samples were collected after a step-by-step process oriented essay writing ins...


A Study of Raters’ Behavior in Scoring L2 Speaking Performance: Using Rater Discussion as a Training Tool

The studies conducted so far on the effectiveness of resolution methods including the discussion method in resolving discrepancies in rating have yielded mixed results. What is left unnoticed in the literature is the potential of discussion to be used as a training tool rather than a resolution method. The present study addresses this research gap by exploring the data coming from rating behavi...


Monologic vs. Dialogic Assessment of Speech Act Performance: Role of Nonnative L2 Teachers’ Professional Experience on Their Rating Criteria

Few, if any, studies have investigated the effect of professional experience as a rater variable and type of assessment as a task variable on raters’ criteria in the assessment of speech acts. This study aimed to explore the impact of nonnative teachers’ professional experience on the use of criteria in monologic and dialogic assessment of 12 role-plays of 3 apology speech acts. To this end, 60...


The Impact of Raters’ and Test Takers’ Gender on Oral Proficiency Assessment: A Case of Multifaceted Rasch Analysis

The application of Multifaceted Rasch Measurement (MFRM) in rating test takers’ oral language proficiency has been investigated in some previous studies (e.g., Winke, Gass, & Myford, 2012). However, little research so far has ever documented the effect of test takers’ genders on their oral performances and few studies have investigated the relationship between the impact of raters’ gender on th...


Publication date: 2005